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Abstract 

The  problem  of  minimizing  a  sum  of  Euclidean  norms  dates  from 
the  17th  century  and  may  be  the  earliest  example  of  duality  in  the 
mathematical  programming  literature.  This  nonsmooth  optimization 
problem  arises  in  many  different  kinds  of  modern  scientific  applications. 

We  derive  a  primal-dual  interior-point  algorithm  for  the  problem,  by 
applying  Newton’s  method  directly  to  a  system  of  nonlinear  equations 
characterizing  primal  and  dual  feasibility  and  a  perturbed  complemen¬ 
tarity  condition.  The  main  work  at  each  step  consists  of  solving  a 
system  of  linear  equations  (the  Schur  complement  equations).  This 
Schur  complement  matrix  is  not  symmetric,  unlike  in  linear  program¬ 
ming.  We  incorporate  a  Mehrotra-type  predictor-corrector  scheme  and 
present  some  experimental  results  comparing  several  variations  of  the 
algorithm,  including,  as  one  option,  explicit  symmetrization  of  the 
Schur  complement  with  a  skew  corrector  term.  We  also  present  results 
obtained  from  a  code  implemented  to  solve  large  sparse  problems,  using 
a  symmetrized  Schur  complement.  This  has  been  applied  to  problems 
arising  in  plastic  collapse  analysis,  with  hundreds  of  thousands  of  vari¬ 
ables  and  millions  of  nonzeros  in  the  constraint  matrix.  The  algorithm 
typically  finds  accurate  solutions  in  less  than  50  iterations  and  deter¬ 
mines  physically  meaningful  solutions  previously  unobtainable. 
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1  Introduction 

A  problem  which  arises  in  many  applications  is  to  minimize  a  sum  of  Eu¬ 
clidean  vector  norms,  i.e. 

D  :  min  j  ^  ||zj||  :  y  E  5Rm;  z,  E  5Ra!;  Ajy  +  z\  =  Cj,  «  =  1, . . .  ,n 
U=l 

where  A,  E  3?mxc!,  c*  E  5Rd,  «  =  1, . . .  ,  n,  are  given.  In  most  applications 
d  =  2  or  d  =  3  so  that  the  terms  in  the  sum  are  distances  in  a  two  or 
three  dimensional  Euclidean  space.  If  d  =  1,  the  problem  D  is  equivalent 
to  a  linear  program  (LP).  The  minimization  objective  is  convex  but  not 
differentiable  at  any  point  where  some  zt  =  0. 

The  sum  of  distances  problem,  D.  has  a  long  and  interesting  history.  The 
special  case  d  =  m  =  2,  n  =  3,  Aj  =  /,  was  studied  by  Fermat  in  the  17th 
century.  This  amounts  to  finding  the  point  in  5R2  which  minimizes  the  sum 
of  distances  from  it  to  three  given  points.  In  the  early  19th  century  it  was 
realized  that  this  particular  convex  optimization  problem  has  a  natural  dual 
maximization  formulation.  Kuhn  [Kuh91]  regards  this  as  the  first  instance 
of  duality  in  the  mathematical  programming  literature.  Further  history  is 
given  in  [Kuh67]. 

Duality  theory  for  D  is  easily  described  using  min-max  theory.  Let 
Xi  e  5Rd,  i  =  1, . . .  ,  n.  For  consistency  with  standard  notation  for  LP,  we 
refer  to  Xi  as  the  primal  variables  and  y.  Zi  as  the  dual  variables,  even 
though  in  our  experience  it  is  usually  the  dual  problem  D  which  explicitly 
arises  in  applications.  We  have 
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min  N 

Afy+Zi=Ci  -=1 


min  max  V 

Afy+Zi=Ci  ||si||<l  i=1 


T 

X;  Zi 


max  min  xj  Zi 

IMI<1  Af  y-Zi-  -a  “ 


max  min  cj x;  —  yT  AjXi 

n&i  y  \f~i  h 


max  l^cjxi  :  ||a5* ||  <  1;  ^AjXi 
U=1  i-l 


The  first  equality  follows  from  Cauchy- Schwartz,  the  second  from  min-max 
theory  [Roc70,  Cor.  37.3.2],  the  third  trivially,  and  the  fourth  because  if 
Ya=i  AiXi  is  not  zero,  the  minimized  value  would  be  —  oo.  Therefore,  the 
dual  of  D  is  the  primal  problem 


P  :  max 


x. 


e 


I 


<1,  i  =  1, 


■  n: 


y '  A^i  —  o 


i= 1 


This  result  is  an  easy  generalization  of  the  duality  theory  in  [Kuh67],  but 
may  not  have  explicitly  appeared  in  the  literature  until  [And96b]. 

Although  the  duality  theory  has  been  known  in  its  simplest  form  for 
nearly  two  centuries,  it  was  not  understood  until  relatively  recently  how 
to  exploit  duality  in  algorithms  for  minimizing  D.  Iteratively  reweighted 
least  squares  (Weiszfeld’s  method  [Wei37])  has  long  been  used  as  a  robust 
though  slowly  converging  method  to  solve  D.  Another  well-known  approach 
is  to  replace  the  terms  ||zj||  in  the  objective  by  the  differentiable  quantity 
VINI2  +  /r2,  where  //  is  a  fixed  positive  number.  This  method  is  also  robust 
but  converges  arbitrarily  slowly  as  [i  — >  0.  Neither  of  these  algorithms  use 
any  aspect  of  duality.  In  both  cases,  the  reason  for  the  slow  convergence  is 
that,  in  most  interesting  applications,  some  of  the  norms  in  the  objective  D 
have  zero  as  their  optimal  value. 

Calamai  and  Conn[CC80,  CC87]  and  Overton[Ove83]  solved  D  using 
Newton  methods  combined  with  an  active  set  approach  to  determine  which 
norms  ||zj||  are  zero  at  an  optimal  solution.  These  methods  were  the  first  to 
exploit  the  duality  structure  of  the  problem,  as  they  explicitly  compute  both 
primal  and  dual  solutions.  However,  Newton’s  method  was  derived  in  the 
y.  z  space  only,  with  the  x  variables  computed  by  least-squares  estimates. 
The  methods  of  Calamai  and  Conn  and  Overton  are  quite  efficient  if  not 
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many  norms  ||zj||  are  zero.  However,  if  this  number  is  large,  the  number  of 
iterations  is  typically  also  large,  because  the  active  set  of  zero  norms  must 
be  updated  at  every  step. 

Andersen  [And96b]  gave  a  method  for  solving  D  which  is  based  on  a 
primal  interior-point  method  for  LP.  In  this  method  the  terms  ||zj||  are  re¬ 
placed  by  «?,  but  the  quantity  /i  is  treated  as  an  extra  variable, 
whose  value  is  determined  by  duality  estimates.  Using  this  method,  Ander¬ 
sen  was  the  first  to  be  able  to  solve  D  rapidly  and  accurately  even  when 
the  number  of  variables  is  large  and  many  norms  ||zj||  are  zero  at  a  solution 
point.  In  [AC98]  it  was  demonstrated  how  the  linearly  constrained  problem 
can  be  reduced  to  the  unconstrained  case  using  an  exact  l  \  penalty  function, 
while  still  preserving  sparsity  structure. 

In  this  paper  we  present  a  primal-dual  interior-point  method  for  solv¬ 
ing  P  and  D.  The  basic  algorithm  is  easy  to  motivate  and  implement. 
The  number  of  iterations  required  is  substantially  fewer  than  for  the  primal 
interior-point  method  used  by  Andersen  [And96b].  This  is  consistent  with 
general  experience  with  primal-dual  versus  primal  interior-point  methods 
for  LP  [Wri97]. 

The  sum  of  norms  problem  is  a  special  case  of  quadratically  constrained 
quadratic  programming  (QCQP),  also  known  as  optimization  over  the  quad¬ 
ratic  cone.  Nesterov  and  Todd  [NT98a,  NT98b]  give  a  theoretical  discussion 
of  algorithms  for  optimization  over  homogeneous  self-dual  cones,  including 
the  quadratic  cone.  See  also  Adler  and  Alizadeh  [AA95]  for  another  primal- 
dual  algorithmic  approach  to  QCQP.  Our  view  is  that  the  sum  of  norms 
problem  is  sufficiently  important  that  a  specialized  approach  is  justified. 
Also  taking  this  view,  Xue  and  Ye  [XY97]  give  a  complexity  analysis  of  the 
sum  of  norms  problem,  using  an  interior-point  method  and  exploiting  the 
general  theory  given  in  [NT98a,  NT98b]. 

Our  primal-dual  algorithm  is  derived  in  the  next  section,  applying  New¬ 
ton’s  method  to  three  conditions:  primal  and  dual  feasibility  and  comple¬ 
mentarity.  A  key  point  is  the  derivation  of  the  appropriate  complementarity 
condition.  The  main  work  at  each  step  consists  of  solving  a  system  of  lin¬ 
ear  equations  (the  Schur  complement  equations).  This  Schur  complement 
matrix  is  not  symmetric,  unlike  its  counterpart  in  linear  programming. 

Section  3  discusses  a  Mehrotra  predictor-corrector  enhancement  to  the 
algorithm  and  considers  symmetrizing  the  Schur  complement  equations,  in¬ 
cluding  a  compensating  skew  corrector  term.  Section  4  presents  experimen¬ 
tal  results  for  some  Steiner  tree  test  problems. 

Section  5  discusses  a  large-scale  implementation  using  a  symmetrized 
Schur  complement.  This  has  been  used  to  solve  applied  problems  arising  in 


4 


plastic  collapse  analysis  with  hundreds  of  thousands  of  variables  and  millions 
of  nonzeros  in  the  constraint  matrix.  The  algorithm  typically  finds  accurate 
solutions  in  less  than  50  iterations  and  determines  physically  meaningful 
solutions  that  were  considered  unobtainable  until  now. 

In  fact,  problem  D  arises  in  many  applications.  Alpert,  Chan,  Kahng, 
Markov  and  Mulet  [ACK+98]  have  recently  applied  a  variant  of  our  algo¬ 
rithm  presented  in  this  paper  to  the  placement  of  circuits  in  VLSI  design. 
Chan,  Golub  and  Mulet  [CGM96]  applied  a  nonlinear  version  of  the  algo¬ 
rithm  to  some  applications  in  image  reconstruction.  Byrnes  and  Bright  [BB] 
used  iteratively  reweighted  least-squares  to  solve  trajectory  optimization 
problems  in  space  exploration.  In  fact,  this  method  (Weiszfeld’s  method) 
has  long  been  used  at  the  Jet  Propulsion  Laboratory  as  a  basic  workhorse 
to  solve  problems  of  the  form  D  that  arise  in  spacecraft  missions  such  as  the 
Galileo  and  Pioneer  “fly-by’s”  of  the  outer  planets.  Strang  [Str79]  consid¬ 
ered  an  isoparametric  design  problem  to  which  Overton  [Ove84]  applied  a 
version  of  his  algorithm  mentioned  above.  Parks  [Par91]  has  applied  related 
methods  to  solve  minimal  surface  (soap  bubble)  problems.  Alexander  and 
Maddocks  [AM93]  used  the  method  of  [Ove83]  to  solve  friction  problems 
arising  in  robotics.  A  key  similarity  in  all  these  applications  is  that  some, 
and  perhaps  many,  of  the  norms  in  the  sum  to  be  minimized  can  expected 
to  have  the  value  zero  at  an  optimal  solution. 

We  believe  there  is  great  opportunity  to  apply  the  primal-dual  method 
given  in  this  paper  to  these  and  many  other  interesting  applications. 


Notation.  Let  Id  denote  the  dx  d  identity  matrix.  Let 


x  = 


'  Xl 

'  zi 

'  Cl 

eH**,  z  = 

e5R*\  c  = 

-  xn  - 

-  zn  - 

-  Cn  . 

e  5Rdn, 


A  = 


A I  •  •  •  Ar, 


E  3? 


mxdn 


The  primal  feasible  region  is  given  by 

X  =  ja;  E  %ldn  :  Ax  =  0;  ||a:j||  <1,  i  —  1, . . .  ,nj 

and  the  dual  feasible  region  is 

V=  {(y,z)  e  3T*  x  Mdn  :  ATy  +  z  =  c}  . 


(1) 

(2) 
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Consequently,  we  may  rewrite  D  as 

f  n 

D  :  min  <  ^  ||zj  || 

li  1 

and  P  as 

P  :  max  jcTa; 

2  Complementarity  and  Newton’s  Method 

Suppose  x  E  X  and  ( y ,  z)  E  y  are  respectively  primal  and  dual  feasible. 
Then  the  duality  gap,  i.e.  difference  between  the  primal  and  dual  objective 
functions,  is 

llz*ll  —  xi  ~  ^  (llz«ll  —  xi  zi )  —  0-  (3) 

t=i  t=i  t=i 

The  duality  gap  must  be  zero  at  an  optimal  solution.  It  is  zero  if  and 
only  if,  for  each  i  =  1 ,...,«,  either  \\zi\\  is  zero  or  Xi  =  Zi/\\zi\\.  This 
complementarity  condition  can  be  conveniently  expressed  as 

z,  -  \\zi\\xi  =  0,  f  =  l,...,n.  (4) 

It  follows  from  the  complementarity  condition  that  for  each  t,  either  zt  =  0 
or  ||ajj||  =  1;  we  say  that  strict  complementarity  holds  if,  for  each  i.  only  one 
of  these  two  conditions  holds.  It  may  happen  that  no  strictly  complementary 
solution  exists,  unlike  in  LP. 

Primal-dual  interior-point  methods  are  based  on  Newton’s  method  ap¬ 
plied  to  three  sets  of  equations:  primal  feasibility,  dual  feasibility,  and  an 
appropriate  complementarity /centering  condition.  The  feasibility  equations 
are  respectively 

Ax  =  0  (5) 

and 

ATy  +  z  =  c.  (6) 

We  assume  from  now  on  that  the  m  by  dn  matrix  A  has  full  rank.  We  also 
assume  that  m  <  dn ,  since  otherwise  P  and  D  are  solved  by  x  =  0,  z  =  0. 

The  primal  and  dual  feasibility  equations  consist  of  m  +  dn  equations 
in  the  m  +  2 dn  scalar  variables  represented  by  y  and  a?,  z.  To  make  this  a 
square  system  we  need  another  dn  equations,  which  are  available  in  the  form 


:  {y,z)  e  y 
:  xex\. 
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of  the  complementarity  condition  (4).  This  condition  is  not  differentiable  if 
||zj||  is  zero,  but  it  may  be  replaced  by  the  centering  condition 

Zi  ~  [\\zi\\2  +  n2y  Xi  =  0,  i  =  l,...,n  (7) 

where  p  >  0. 

The  following  theorem  is  from  [And96b],  showing  that  the  centering 
condition  (7)  is  in  fact  the  complementarity  condition  for  the  following  pair 
of  smooth  optimization  problems: 


Theorem  1  The  problems  D ^  and  P ^  are  a  primal-dual  pair.  Specifically, 
has  the  solution  (y(p),  z(p))  and  P ^  has  the  solution  x(p),  all  satisfying 
(5),  (6)  and  (7). 

Proof:  The  proof  is  a  simple  modification  of  the  proof  (given  in  Section  1) 
that  P  and  D  are  a  primal-dual  pair.  See  [And96b]  for  details.  □ 

This  theorem  shows  that  introducing  the  centering  parameter  p  in  the 
complementarity  conditions  for  the  original  pair  of  problems  is  equivalent  to 
smoothing  the  norms  in  D  and  introducing  a  cost  into  P  which  moves  the 
primal  solution  away  from  its  boundary.  The  solutions  (x(p),  y  [p],  z(p))  of 
Pji .  D/( .  for  p  >  0.  define  a  sort  of  central  path  for  P,  I),  though  not  one 
derived  from  a  logarithmic  barrier  function  and  therefore  not  centered  in 
the  usual  sense. 

Let  us  write  the  centering  condition  (7)  as 

@{p,z)x  -  z  =  0,  where  0(/r,  z)  =  Diag  ^(||zj||2  +  /r2)  2  Idj  ■  (8) 

Collecting  (6),  (5)  and  (8)  together,  we  have  the  nonlinear  system  of  equa¬ 
tions 

^  ATy  +  z  —  c  ^ 

G/n(x,  y,  z)  —  Ax  =0,  (9) 

\  ®{T,z)x  -  z 
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whose  solution  is  (x(y,),  y (y),  z(y)).  Newton’s  method  applied  to  G p  at  a 
given  point  (a;,  y,  z)  gives  the  following  linear  system  defining  updates  to 
the  variables: 


where 


and 


0  at  Idn 

Ax 

rd 

A  0  0 

Ay 

= 

rP 

_  Ep  0  —F n  _ 

A  z 

fc 

rd  =  c-ATy-z1  rp  =  -Ax ,  rc  =  z-Epx , 


Ep  =  Diag  (u>fJd) , 


E  n  =  Diag 


(10) 


(11) 

(12) 

(13) 


The  equation  rd  =  0  is  maintained  exactly  at  each  step,  by  always  defining 
z  =  c  —  ATy.  Although  rp  can  be  set  to  zero  initially  by  setting  the  first 
iterate  x  =  0  or  setting  x  to  a  null  vector  of  A,  we  do  not  assume  this,  since 
primal  feasibility  cannot  be  maintained  exactly  in  the  presence  of  rounding 
errors. 

Eliminating  A z,  we  have 


'  f-^e,  at' 

Ax 

'  F»lrc  ' 

A  0 

Ay 

—Ax 

(14) 


Defining  Hp  =  EfllFp^  and  using  the  definition  of  rc  in  (11),  we  find  after 
one  more  elimination  step  that 


AHpATAy  =  AE~xz  (15) 

and 

Ax  =  E~1(F^Az  +  rc),  (16) 

where  (immediately  from  dual  feasibility) 

A  z  =  —AT  Ay.  (17) 

The  operations  of  multiplying  vectors  by  F p  and  E~ 1  are  trivial  since  F p  is 
block  diagonal  and  Ep  is  positive  diagonal.  Notice  the  explicit  dependence 
of  Efl  and  Fjr  on  the  centering  parameter  y.  in  contrast  to  the  situation  in 
LP,  where  the  corresponding  diagonal  matrices  depend  only  on  the  current 
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variables.  This  is  a  consequence  of  the  more  complicated  nature  of  the 
complementarity  condition  (4). 

The  main  cost  of  this  process  is  forming  and  factoring  the  Schur  comple¬ 
ment  AH ^AT .  Except  in  the  trivial  case  d  =  1,  the  block  diagonal  matrix 
is  not  generally  symmetric,  since  Fjt  is  not.  This  presents  difficulties 
for  large  sparse  problems,  where  it  is  highly  desirable  to  use  sparse  Cholesky 
techniques  which  apply  only  to  symmetric,  positive  definite  systems.  Sparse 
LU  factorization  techniques  for  nonsymmetric  linear  systems  are  more  costly 
since  they  require  pivoting  for  stability  and  are  therefore  not  able  to  exploit 
sparsity  as  effectively  as  sparse  Cholesky  methods. 

However,  note  that  F ^  is  positive  definite  (in  the  sense  that  vT F^v  >  0 
for  all  v  /  0)  as  long  as  x  <E  X.  and  therefore  so  are  H  t,  and  AH A1 
(since  E^  and  Fjr  commute).  Furthermore,  it  follows  from  equation  (7) 
that  for  (a;,y,z)  =  (a?(/r), y (/r), z(/r))  (see  Theorem  1),  the  matrix  F ^  and 
therefore  also  Hfl  and  AH /t  A1  are  symmetric  for  all  /i  >  0.  Consequently, 
when  defined  sufficiently  close  to  the  “central  path”,  AiT^AT  is  nearly 
symmetric.  One  of  the  issues  we  shall  discuss  in  the  next  section  is  the 
effect  of  symmetrizing  Hjs  by  defining  it  to  be  \E~ 1  (F lt  +  Fjt )  instead  of 
E~XF,. 

As  n  — >  0,  for  each  i,  either  Zi(y)  or  Xi(y)  —  Zi(y) /\\zi(y)  ||  converges  to 
zero.  In  the  the  latter  case,  the  limit  of  the  fth  block  of  the  corresponding 
F t,  is  singular,  while  in  the  former  case  the  limit  of  the  z'th  block  of  the 
corresponding  Ejr  is  zero. 

We  now  discuss  how  to  update  the  iterates  a;,  y  and  z  after  Aa;,  Ay  and 
Az  are  computed.  We  start  by  observing  that  Az  is  a  descent  direction  for 
the  smoothed  dual  objective  function  in  D^:  the  gradient  of  this  function 
with  respect  to  z  is  easily  seen  to  be  E~lz.  Using  (17)  and  (15),  we  have 

(Azfi^z  =  -(A  yfAE-'z 

=  -  {Ay)T  AH ^AT Ay 

<  0 

(unless  z  =  0),  since  the  symmetric  part  of  AH^AT  is  positive  definite. 
Consequently,  it  is  natural  to  update  y  and  z  by  using  a  line  search  on  the 
smoothed  dual  function  in  Zf/(.  We  therefore  update  the  dual  variables  by 

Dual  Line  Search  Rule 

y  =  y  +  PAy,  z  =  z  +  /3Az,  (18) 
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where 


n  j_ 

P  ~  argmin^  (||zj  +  /3Azj||2  +  //2)2  (19) 

—  —  i=  1 

The  same  steplength  must  be  used  for  y  and  z  to  maintain  dual  feasibility. 
Of  course,  the  univariate  minimization  problem  need  not  be  solved  exactly. 

The  direction  Ax  is  not  necessarily  an  ascent  direction  for  the  penalized 
primal  objective  function  in  P^,  so  a  line  search  is  not  appropriate  to  update 
the  primal  iterate  x.  We  consider  two  possibilities: 

Primal  Scaling  Rule 

x  =  7  (x  +  Ax)  (20) 

where 

7  =  max {7:  7||ajj  +  Aajj||  <  1  *  =  l,...,n}.  (21) 

The  scaling  rule  is  a  trivial  computation. 

Primal  Steplength  Rule  The  step  to  the  boundary  is  given  by 

ttmax  =  max  {a  :  \\xi  +  aA®j||  <  1,  i  —  1, . . .  ,  n}  =  ma xcq, 

i 

where  a,  is  the  positive  root  of  a  quadratic  equation: 


We  choose  0<t<1  and  define 

OL  =  TG!max.  (22) 

Then  the  steplength  rule  is  defined  by 

x  —  x  +  min(l,  a)  Ax.  (23) 

Both  rules  preserve  primal  feasibility  in  exact  arithmetic.  For  the  step- 
length  rule,  conventional  experience  with  primal-dual  interior-point  methods 
dictates  a  choice  of  r  less  than  1,  but  not  much  less,  for  example,  r  =  .99  or 
r  =  .999.  For  sums  of  norms,  however,  we  found  that  r  =  1  also  works  quite 
well.  This  allows  iterates  x  to  actually  reach  the  boundary  of  the  feasible 
region,  but  the  matrix  Fjr  still  cannot  be  singular  as  long  as  /i  >  0.  Increased 
ill-conditioning  of  the  linear  systems  which  are  solved  as  convergence  takes 
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place  is  a  standard  feature  of  interior-point  methods  and  generally  does  not 
cause  great  difficulties  except  when  the  iterates  are  nearly  optimal.  The 
reason  we  do  not  allow  r  =  1  is  that  rounding  errors  may  then  cause  the 
updated  x  to  lie  just  outside  the  feasible  region. 

The  scaling  rule  always  places  x  exactly  on  the  boundary  of  the  feasible 
region.  This  is  not  appropriate  if  the  solution  x  has  all  component  norms 
||x,||  <  1,  but  this  is  a  trivial  case  since  then  the  dual  solution  must  be  zero 
by  complementarity.  As  far  as  we  know,  the  scaling  rule  does  not  have  an 
analogy  in  standard  interior-point  implementations:  such  a  rule  is  possible 
only  when  the  primal  equality  constraints  are  homogeneous  as  they  are  here. 

Equations  (15),  (16),  (17)  and  the  updating  rules  just  described  define 
the  basic  ingredients  of  a  primal-dual  interior-point  method  for  solving  P 
and  D.  To  complete  the  description  of  the  algorithm  we  must  define  a  rule 
for  updating  the  parameter  ji:  for  this  we  introduce  a  predictor-corrector 
method. 


3  Mehrotra’s  Predictor-Corrector  Method  and  a 
Symmetrized  Algorithm  with  a  Skew  Correction 
Term 

Mehrotra’s  predictor-corrector  method  is  a  standard  tool  in  primal-dual 
interior-point  software  for  LP.  The  basic  idea  is  to  first  compute  a  predic¬ 
tor  step  defined  by  first-order  approximations  to  the  optimality  conditions 
(i.e.  Newton’s  method),  and  to  follow  this  with  a  corrector  step  which  also 
takes  second-order  terms  into  account.  A  key  point  is  that  both  predictor 
and  corrector  use  the  same  matrix  factorization;  only  the  right-hand  sides 
of  the  linear  equations  defining  the  steps  differ.  Another  key  component  is  a 
technique  for  estimating  the  centering  parameter  //.  Mehrotra’s  method  was 
originally  given  in  [Meh92];  an  excellent  discussion  may  be  found  in  [Wri97, 
Chap.  10]. 

We  now  discuss  how  to  adapt  Mehrotra’s  method  to  our  problem.  Let 
us  replace  a;,  y  and  z  in  the  centering  condition  (8)  by  x  +  Aa;,  y  +  Ay  and 
z  +  Az  respectively,  obtaining,  for  i  =  1, . . . ,  n, 

Zi  +  A Zi  -  (|| Zi  +  Azi\\2  +  /r2)  2  (xi  +  As,)  =  0, 
i.e. 

zi  +  A  zi  -  <4  ^1  +  )  (*i  +  Aa*)  =  0, 
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which  gives 


Zi  +  Azi  -wf  1  + 


A  Zi  ||  Azj||2  (zjAzif 


+ 


K)2  2(cf) 


VY2 


2(o;f)4 


H -  (*j  +  A»j)  =  0. 


Moving  first-order  terms  to  the  left-hand  side,  constant  and  second-order 
terms  to  the  right-hand  side,  neglecting  higher  than  second-order  terms  and 
changing  the  sign  of  the  equation  we  obtain 


(Eu  Ax-FflAz)i  =  {rc)i 


s  J  Az, 


Axi 


\\Azj\ 


(4  A  zt)2 

2(4)3  ? 


(24) 


for  i  =  1, . . .  ,n.  The  idea,  then,  is  to  compute  the  predictor  steps  Ax,  Ay 
and  Az  from  equations  (15)— (17),  and  then  use  these  to  define  the  second- 
order  terms  which  are  included  in  the  right-hand  side  of  the  linear  system 
solved  to  obtain  the  corrector  step,  using  the  factorization  of  AH ^AT  a 
second  time. 

As  noted  above,  a  key  component  of  Mehrotra’s  method  is  to  exploit 
the  result  of  the  predictor  step  to  define  a  heuristic  value  for  the  centering 
parameter  y  to  be  used  in  the  computation  of  the  corrector  step.  This  is 
provided  by 


(gap(a;  +  Ax,  z  +  Az))3 
n(gap(a;,z))2 


(25) 


Zi  -  Xj  Zi  , 


(26) 


where 

n 

gap  [x,z]  = 

i= 1 

This  is  a  natural  generalization  of  Mehrotra’s  formula  for  LP.  The  value  y 
may  be  substituted  for  y  in  all  the  terms  on  the  right-hand  side  of  (24), 
including  the  second-order  terms  as  well  as  the  constant  term,  giving 


=  zi  -  Jlx,  -  ^flAxi  - 


| Azj||2 _  i  (zfAzi)2^ 

~Xi  T  r,  „  Xi, 


2o/ 


(27) 


where 

4  =  (\\zif  +  4)2  ■  (28) 

It  is  not  practical  to  substitute  y  for  y  on  the  left-hand  side  of  (24), 
since  the  factorization  of  Hjr  has  already  been  computed  using  the  previous 
value  for  y.  Consequently,  we  also  add  to  the  right-hand  side  of  (24)  further 
corrector  terms  of  the  form 


(29) 
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As  noted  in  the  previous  section,  the  nonsymmetry  of  H is  a  major  dis¬ 
advantage  for  large  sparse  problems.  We  therefore  consider  here  the  idea 
of  explicitly  symmetrizing  H t,.  defining  it  to  be  \E~ 1  (F lt  +  Fjf)  instead 
of  E~lF p.  This  suggests  subtracting  a  skew  correction  term  ^E~l{F ^  — 
Fjt  )Az  from  the  right  hand  side,  i.e.  adding  terms 

^■3)  =  ((A zjxi)zi  -  (A (30) 

to  (24).  Note  the  use  of  wf,  not  wf,  in  the  denominator.  Putting  all  this 
together,  the  corrector  step  is  defined  by 


Efj,  Ax  —  F n  Az  =  r% 


(31) 


where  the  ith  block  of  the  “corrected  centering”  residual  is 


h^  +  h™  if  H^E-'Fp 

hV  +  hW+hV  if  H^\E-y(F,  +  Fl) 


(32) 


using  (27),  (29)  and  (30). 

Substituting  rcc  for  rc  in  (14),  we  therefore  compute  the  corrector  steps 


from 

AH^ATAy  =  A  ( E~lrcc  +  x) 

(33) 

and 

A x  =  E~l  ( F,Az  +  rcc) 

(34) 

with  Az  given  by  (17). 

Note  that  by  analogy  with  standard  practice  in  LP,  it  might  seem  ap¬ 
propriate  to  modify  the  right-hand  side  rc  used  by  the  predictor  step  by 
substituting  0  for  /i  in  its  definition.1  In  practice,  whether  or  not  this  is 
done  seems  to  have  little  effect,  but  one  reason  not  to  make  this  choice  is 
that  then  the  dual  predictor  step  is  no  longer  guaranteed  to  be  a  descent 
direction  for  the  smoothed  objective  function  in  H/(.  There  is  no  guarantee 
that  the  dual  corrector  step  is  a  descent  direction  for  either  this  function  or 
the  corresponding  function  defined  using  fi  instead  of  ji.  although  it  usually 
is.  If  it  is  a  descent  direction  for  the  latter  function,  we  update  the  iterates 
as  before,  using  fi  instead  of  ji  in  the  objective  function  in  the  dual  line 
search.  Otherwise,  we  abandon  both  primal  and  dual  corrector  steps  and 
use  the  predictor  steps  instead. 

1In  fact,  this  was  done  in  the  experiments  reported  in  Section  5. 
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We  now  summarize  the  algorithm.  We  initialize  it  with  x  =  0,  y  set  to 
the  minimizer  of  ||c  —  ATy  ||  and  z  =  c  —  ATy.  Assume  that  an  initial  value 
of  y  >  0  is  given,  as  well  as  a  termination  tolerance  e. 

Predictor-Corrector  Algorithm  for  Minimizing  a  Sum  of  Norms 

1.  Define  wf ,  E ^  and  F ^  by  (12)  (13) . 

2.  Define  H ^  by  either  E~lF^  or  ^E~l{F ^  +  F'jt)  and  find  either  the 
LU  or  Cholesky  factorization,  respectively,  of  AH ^  A1.  In  the  former 
case,  quit  if  the  LU  factorization  generates  a  zero  pivot.  In  the  lat¬ 
ter  case,  either  quit  if  the  Cholesky  factorization  fails,  or  modify  the 
factorization,  effectively  redefining  H by  a  nearby  positive  definite 
approximation. 

3.  Determine  the  predictor  steps  Ay,  A z  and  Ax  from  (15),  (17)  and 
(16). 

4.  Define  x  from  the  primal  scaling  rule  ((20)  and  (21))  or  the  primal 
steplength  rule  ((22)  and  (23)),  and  y.  z  by  the  dual  line  search  rule 
((18)  and  (19)).  Quit  if  the  dual  line  search  fails  to  achieve  a  reduction 
in  the  smoothed  dual. 

5.  Define  y  by  (25)  and  wf  by  (28). 

6.  Determine  the  corrector  steps  Ay,  Az  and  Ax  from  (33),  (17),  (34). 

7.  If  {E~^lz)T  Az  <  0,  redetermine  ay  y  and  z  using  the  primal  scaling 
or  steplength  rule  with  y  instead  of  y,  and  the  dual  line  search  rule 
with  y  instead  of  y;  quit  if  the  dual  line  search  rule  fails  to  achieve  a 
reduction  in  the  smoothed  dual  objective. 

8.  Replace  ay  y,  z  and  y  bv  ay  y.  z  and  y  respectively. 

9.  If  ||Ax||  >  gap(ayz),  quit.  (This  cannot  happen  in  exact  arithmetic 
and  indicates  that  rounding  errors  will  dominate  any  further  compu¬ 
tation.) 

10.  Ifgap(ayz)  <  e  and  ||c—  ATy  —  z\\  +  ||Aa:||  <  e,  quit;  otherwise  repeat. 

There  are  several  ways  the  algorithm  might  terminate  when  rounding 
errors  prevent  further  progress:  breakdown  of  the  factorization,  failure  in 
the  line  search,  or  growth  in  the  primal  infeasibility  ||A®||  with  respect  to 
the  duality  gap  measure  gap  (ay  z).  The  occurrence  of  any  of  these  conditions 
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essentially  indicates  that  the  convergence  tolerance  e  is  set  too  small;  in  any 
case,  when  they  occur,  the  current  or  previous  approximation  is  generally 
quite  accurate. 

4  Experiments  on  Small  Problems 

We  now  report  some  numerical  results  for  this  algorithm,  comparing  the 
symmetrized  and  nonsymmetrized  versions  and  other  algorithmic  options 
described  above.  These  results  were  obtained  using  a  Matlab  implementa¬ 
tion  run  on  a  set  of  small  topologically-constrained  Steiner  tree  test  problems 
(for  more  details,  see  [D098]).  The  sparsity  in  the  data  is  determined  by 
the  tree  structure  and  its  topological  constraint,  but  subject  to  these  qual¬ 
ifications,  the  data  are  generated  randomly.  Each  table  shows  a  summary 
of  results  from  many  runs  with  different  random  data  on  the  same  problem 
class.  Sparsity  was  not  exploited.  In  all  cases  d  =  2.  The  dual  line  search 
was  performed  using  the  Matlab  fmin  function  with  its  default  tolerance. 
The  machine  used  was  a  Sparc  Ultra  with  IEEE  double  precision  arithmetic. 

The  tables  show,  for  various  cases,  the  number  of  iterations,  the  final 
values  of  gap (x,z)  (defined  in  (26))  and  the  infeasibility  norm  sum  ||Ax||  + 

1 1 c  —  ATy  —  z ||,  each  as  medians  over  a  set  of  randomly  generated  problems 
in  a  given  class.  The  termination  tolerance  e  was  set  to  10-10. 

In  Tables  1  and  2,  we  consider  a  class  of  Steiner  tree  problems  with 
n  =  50,  m  =  62,  for  which  strict  complementarity  holds  at  the  solution,  and 
for  which  the  median  number  of  indices  for  which  ||zj||  =  0  at  the  optimal 
solution  is  15.  We  compare  the  nonsymmetric  and  symmetrized  variants 
of  the  algorithm  (H^  =  E~lF ^  and  H ^  ^ E~l{F ^  +  F^)  respectively), 
with  two  choices  for  updating  the  primal  variable:  the  Scaling  Rule  and  the 
Steplength  Rule  with  r  =  0.999.  All  variants  used  the  Dual  Line  Search 
Rule.  For  the  symmetrized  algorithm,  we  tested  both  a  version  which  quits 
if  the  Cholesky  factorization  of  Hjr  fails,  and  one  that  modifies  the  factor¬ 
ization  and  continues  iterating:  the  latter  is  standard  practice  in  LP  [Wri97, 

p.219].  We  also  tested  a  variant  of  the  symmetrized  version  which  omits 

(3) 

the  skew  correction  h)  .  Finally,  we  also  tested  the  effect  of  omitting  the 

(2) 

correction  h\  ; ,  but  this  had  essentially  no  effect  in  any  case. 

Table  1  shows  the  results  for  the  Primal  Scaling  Rule  and  Table  2  shows 
the  results  using  the  Primal  Steplength  Rule.  The  notations  “skew  corr”  and 
“mod  Choi”  refer  to  the  use  of  the  skew  correction  term  and  the  modified 
Cholesky  factorization  respectively. 

The  results  clearly  confirm  three  remarkable  properties  of  primal-dual 
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Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

8 

le-  12 

le  -  12 

Symmetrized 

8 

le-06 

6e-  13 

Symmetrized,  skew  corr 

7 

le  —  08 

7e-  13 

Symmetrized,  mod  Choi 

15 

5e-  11 

le  -  12 

Symmetrized,  skew  corr,  mod  Choi 

9 

2e-  11 

4e-  12 

Table  1:  Summary  of  Results  for  the  Scaling  Rule 


Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

9 

4e-  12 

6e-  13 

Symmetrized 

11 

oo 

o 

I 

oo 

2e-  13 

Symmetrized,  skew  corr 

9 

4e  —  09 

le  -  14 

Symmetrized,  mod  Choi 

15 

4e-  11 

8e-  13 

Symmetrized,  skew  corr,  mod  Choi 

10 

le  -  11 

4e-  13 

Table  2:  Summary  of  Results  for  the  Steplength  Rule 


predictor-corrector  algorithms  now  well  known  for  linear  programming: 

•  Robust  convergence  to  an  optimal  solution  in  all  cases  tested 

•  Rapid  local  convergence  so  a  consistently  small  number  of  iterations 
is  required  despite  the  demand  for  high  accuracy 

•  Highly  accurate  solutions  achieved  despite  the  extremely  ill-condi¬ 
tioned  linear  systems  being  solved  towards  the  end  of  the  solution 
process 

We  now  comment  in  more  detail  on  the  results  in  Tables  1  and  2.  First, 
notice  the  high  accuracy  achieved  by  the  nonsymmetric  version  of  the  algo¬ 
rithm;  the  symmetrized  version  without  the  modified  Cholesky  factorization 
cannot  reach  the  same  level  of  accuracy.  With  modified  Cholesky,  high  ac¬ 
curacy  is  achievable,  but  more  iterations  are  required.  The  inclusion  of  the 

(3) 

skew  correction  term  h\  '  substantially  improves  the  performance  of  the 
symmetrized  version  of  the  algorithm  whether  or  not  the  Cholesky  factor¬ 
ization  is  modified. 

For  both  the  nonsymmetric  and  symmetrized  versions  the  Primal  Scaling 
Rule  has  a  slightly  lower  iteration  count  than  the  Primal  Steplength  Rule, 
apparently  because  this  version  of  the  algorithm  has  a  somewhat  faster  local 
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Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

7 

2e-  13 

7e-  15 

Symmetrized 

16 

6e-  11 

7e-  15 

Symmetrized,  skew  corr 

10 

2e-  11 

8e-  15 

Symmetrized,  mod  Choi 

16 

6e-  11 

7e-  15 

Symmetrized,  skew  corr,  mod  Choi 

10 

2e-  11 

8e-  15 

Table  3:  Results  for  the  Scaling  Rule  on  the  20  Chung-Graham  ladder  prob¬ 
lems  with  strictly  complementary  solutions 


Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

9 

2e-  12 

7e-  15 

Symmetrized 

16 

6e-  11 

8e-  15 

Symmetrized,  skew  corr 

10 

7e-  12 

8e-  15 

Symmetrized,  mod  Choi 

16 

6e-  11 

8e-  15 

Symmetrized,  skew  corr,  mod  Choi 

10 

7e-  12 

8e-  15 

Table  4:  Results  for  the  Steplength  Rule  on  the  20  Chung-Graham  ladder 
problems  with  strictly  complementary  solutions 


convergence  rate.  However,  we  note  that  the  Scaling  rule  has  the  significant 
disadvantage  that  it  is  not  applicable  if  nonhomogeneous  linear  constraints 
are  added  to  the  problem. 

In  Tables  3  through  6  we  display  results  for  a  different  class  of  topologi¬ 
cally  constrained  Steiner  tree  examples,  based  on  the  Chung-Graham  ladder 
problem  (see  [D098]).  For  these  examples,  n  =  85,  m  =  84,  and  the  me¬ 
dian  number  of  indices  for  which  ||zj||  equals  0  at  the  optimal  solution  is  10. 
For  most  of  these  problems,  no  strictly  complementary  (SC)  solution  exists. 
(Recall  from  Section  2  that  a  solution  is  said  to  be  strictly  complementary 
if,  for  each  i,  exactly  one  of  the  conditions  | z,  \  =  0  and  ||x*||  =  1  holds.) 

Tables  3  and  4  show  results  for  the  20  cases  out  of  200  generated  where 
an  SC  solution  is  found  (using  the  Primal  Scaling  and  Primal  Steplength 
Rules  respectively),  while  Tables  5  and  6  show  results  for  the  other  180  cases 
where  no  SC  solution  is  found,  presumably  because  such  a  solution  does  not 
exist.  The  algorithm  achieves  the  same  accuracy  (by  the  duality  gap  and 
feasibility  measures)  on  the  SC  and  non-SC  problems,  but  the  iteration 
count  is  markedly  higher  in  the  non-SC  case,  and  the  rate  of  convergence  of 
the  algorithm  was  observed  to  be  slower  in  the  non-SC  case.  For  the  non- 
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Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

14 

3e-  11 

8e-  15 

Symmetrized 

16 

be-  11 

8e-  15 

Symmetrized,  skew  corr 

14 

4e-  11 

8e-  15 

Symmetrized,  mod  Choi 

16 

5e-  11 

8e-  15 

Symmetrized,  skew  corr,  mod  Choi 

14 

4e-  11 

8e-  15 

Table  5:  Results  for  the  Scaling  Rule  on  the  180  Chung-Graham  ladder 
problems  with  NO  strictly  complementary  solution 


Version 

iter 

(median) 

gap 

(median) 

infeas 

(median) 

Not  symmetrized 

16 

3e-  11 

8e-  15 

Symmetrized 

22 

be-  11 

8e-  15 

Symmetrized,  skew  corr 

17 

3e-  11 

8e-  15 

Symmetrized,  mod  Choi 

22 

be-  11 

8e-  15 

Symmetrized,  skew  corr,  mod  Choi 

17 

3e-  11 

8e-  15 

Table  6:  Results  for  the  Steplength  Rule  on  the  180  Chung-Graham  ladder 
problems  with  NO  strictly  complementary  solution 


SC  problems,  the  residuals  ||zj||  are  not  reduced  nearly  as  close  to  zero  for 
indices  i  for  which  SC  does  not  hold.  The  reason  for  this  is  that  the  duality 
gap  tolerance  requires  the  products  zj (zj/||zj||  —  Xi)  to  be  small  and  both 
factors  in  the  product  for  such  an  index  i  converge  to  zero  as  the  solution  is 
approached. 

For  these  problems,  the  modified  Cholesky  factorization  is  not  needed: 
the  results  are  identical  whether  or  not  it  is  used. 

On  the  basis  of  the  experiments  reported  in  this  section,  we  recommend 
the  symmetrized  version  of  the  algorithm  with  the  skew  correction  term 
and  the  modified  Cholesky  factorization,  using  the  Dual  Line  Search  and 
either  the  Primal  Scaling  or  the  Primal  Steplength  Rule.  The  choice  of  the 
symmetrized  version  is  based  on  the  substantial  advantage  of  being  able  to 
use  the  Cholesky  factorization  instead  of  the  LU  factorization. 
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5  Large  Sparse  Problems  arising  in  Plastic  Collapse 
Analysis 


A  variant  of  the  algorithm  described  above  has  been  used  to  solve  some  chal¬ 
lenging  large  sparse  problems  arising  in  plastic  collapse  analysis.  We  used 
a  symmetrized  version  of  the  algorithm,  with  H ^  +  Fjt),  so 

that  the  Schur  complement  AH^A1  can  be  factored  by  Cholesky  decompo¬ 
sition,  modified  to  ensure  positive  definiteness  as  discussed  earlier.  This  is 
the  primary  cost  of  the  algorithm.  Details  of  this  and  other  numerical  linear 
algebra  issues  are  available  in  [AA97],  [AY97]  and  [And96a].  The  primal 
steplength  rule  was  used,  with  r  =  0.99  in  equation  (22). 

This  sparse  implementation  was  developed  over  several  years  with  large- 
scale  applications  in  mind.  There  are  two  primary  differences  from  the 
algorithm  discussed  in  Section  3.  The  first  is  that  a  different  generaliza¬ 
tion  of  Mehrotra’s  method  was  used,  based  on  differentiating  a  form  of  the 
centering  condition  which  incorporates  the  symmetrization  of  F ^  directly, 
and  therefore  does  not  require  a  skew  correction  term.  The  second  is  that 
individual  centering  parameters  were  used  instead  of  one  parameter,  namely 


fit  —  ^ 


H, 


fi  (f 
{  Vv, 


if  0.25  <  (l  —  ||*i||2) 

if  n  <  (l  —  ||aii||2)  <  0.25 
if  (l  -  ||£j||2)  <  n 


for  i  =  1, . . . .  n.  This  modification  was  found  to  give  significant  improve¬ 
ments  in  performance  for  the  large-scale  problems. 

The  first  three  classes  of  test  problems  are  taken  from  [And96b],  where 
a  primal  barrier  method  was  used.  We  are  unaware  of  any  other  published 
results  for  large  sparse  problems  of  the  form  ( D ).  These  test  problems  are 
finite  dimensional  discretizations  of  collapse  problems  in  rigid  plasticity.  The 
discretization  step  and  the  physical  interpretation  of  the  results  can  be  found 
in  [Chr96]  and  in  [AC098].  The  discrete  optimization  problems  are  the  same 
as  in  [And96b].  The  m  by  dn  matrix  A  is  a  typical  finite  element  matrix 
which  in  plastic  analysis  is  not  square  since  the  equilibrium  equation  for  the 
continuum  is  under-determined.  As  earlier,  H is  block  diagonal  with  block 
size  dx  d.  In  the  cases  reported  here  d  is  either  2  or  3.  The  runs  were  made 
on  the  same  Convex  3240  vector  machine  (using  IEEE-compatible  double 
precision)  as  in  [And96b]  so  comparisons  of  accuracy  and  CPU  time  are 
meaningful. 

In  the  tables,  n  and  m  specify  the  problem  dimensions  while  |A|,  \AH^AT 


19 


and  \L  \  respectively  denote  the  number  of  nonzeros  in  A,  the  upper  triangle 
of  AH mAt  and  the  Cholesky  factor  L  of  AH lt  Ar.  These  numbers  are  the 
same  as  in  [And96b],  except  for  small  variations  in  sparsity  due  to  improve¬ 
ments  in  the  implementation.  The  iteration  count  is  denoted  by  “iter”  and 
“cpu”  is  the  CPU  time  in  seconds.  The  heading  “||zj||  =  0”  indicates  the 
number  of  norms  in  the  dual  objective  that  are  zero  at  the  optimal  solu¬ 
tion.  More  precisely,  ||zj||  is  interpreted  as  being  zero  if  it  is  less  than  the 
tolerance  10-10.  The  heading  “relgap”  denotes  the  relative  duality  gap 

£"=  i  INI  -  cTx 
ENINI  +  i  ' 

In  addition  to  being  scaled  this  is  a  slightly  different  measure  than  the 
complementarity  defined  in  (26):  if  Ax  =  0  exactly  we  have 

cTx  =  ( ATy  +  z)Tx  =  xTz. 

Hence  the  difference  is  dominated  by  the  primal  infeasibility  ||  Ax\\  indicated 
in  the  last  column. 


N 

n 

m 

\A\ 

\AH  p  A1' \ 

\L\ 

4 

25 

15 

224 

97 

105 

10 

121 

99 

1760 

1010 

1785 

50 

2601 

2499 

48800 

31010 

171083 

100 

10201 

9999 

197600 

127010 

929515 

300 

90601 

89999 

1792800 

1161010 

13203975 

400 

160801 

159999 

3190400 

2068010 

31011299 

N 

iter 

cpu 

N  =  ° 

relgap 

Ax  || 

4 

7 

0 

0 

2e  — 

10 

2e 

-  14 

10 

8 

1 

0 

2e  — 

10 

7e 

-  15 

50 

9 

65 

0 

2e  — 

09 

4e 

-  13 

100 

10 

354 

0 

le  — 

09 

9e 

-  14 

300 

11 

6709 

0 

le  — 

09 

2e 

-  13 

400 

12 

22139 

0 

2e  — 

10 

2e 

-  13 

Table  7:  Problem  and  solution  characteristics  for  sspiV 


The  first  set  of  problems  is  denoted  sspiV  (simply  supported  plate  with 
a  point  load  solved  on  an  N  x  N  grid).  They  all  have  the  same  structure, 
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but  vary  in  size,  depending  on  the  grid  in  the  finite  element  analysis.  In 
this  problem  d  =  3.  This  set  of  problems  is  characterized  by  having  no  zero 
norms  in  the  solution,  i.e. ,  they  are,  in  fact,  smooth  optimization  problems. 
In  [And96b]  the  constraints  |  xt  \  <  1  are  satisfied  within  a  tolerance  of  order 
10-9.  These  constraints  are  satisfied  exactly  in  the  primal-dual  method. 
Except  for  this  improvement  in  accuracy,  the  primal-dual  method  shows  no 
significant  difference,  for  these  problems,  compared  to  the  primal  barrier 
method  in  [And96b].  There  is  a  small  reduction  in  the  iteration  count,  but 
not  in  the  CPU  time.  This  is  clearly  a  consequence  of  the  fact  that  these 
problems  are  smooth. 

For  this  problem,  as  well  as  for  the  other  results  reported  below,  there 
is  no  significant  difference  in  the  final  duality  gap  and  primal  infeasibility 
achieved  by  the  two  algorithms.  They  are,  in  all  cases,  about  10-8  or  less. 

The  second  set  of  problems,  denoted  by  IN al3,  arises  in  the  plane  strain 
model  in  plasticity.  Again  N  indicates  the  grid  size.  In  these  problems 
d  =  2.  Characteristics  and  results  are  given  in  Table  8. 


N 

n 

m 

\A.\ 

\AH^Ar\ 

\L\ 

3 

49 

52 

1390 

1142 

1207 

12 

625 

640 

21331 

26406 

57421 

21 

1849 

1876 

64762 

84384 

278691 

30 

3721 

3760 

131683 

175086 

726046 

60 

14641 

14720 

524403 

713766 

4337857 

99 

39601 

39732 

1425134 

1957631 

14693151 

120 

58081 

58240 

2092843 

2881926 

24202413 

N 

iter 

cpu 

\\zi  |  =  0 

relgap 

Arc 

3 

14 

1 

12 

be-  10 

2e 

-  12 

12 

19 

34 

179 

2e  —  09 

4e 

-  12 

21 

24 

200 

958 

4e-  10 

4e 

-  12 

30 

24 

461 

2315 

7e  —  09 

3e 

-  11 

60 

28 

4710 

11265 

4e  —  09 

le 

-  10 

99 

30 

17937 

33503 

7e  —  09 

9e 

-  10 

120 

34 

44144 

50548 

8e  —  09 

4e 

-  10 

Table  8:  Problem  and  solution  characteristics  for  Z  A/a  13 

In  this  problem  set,  the  number  of  zero  norms  varies  from  25  percent 
for  N  =  3,  to  87  percent  for  N  =  120.  Compared  with  [And96b],  there  is 
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a  significant  reduction  in  the  number  of  iterations  and  in  CPU  time.  If  we 
disregard  the  smallest  case,  N  =  3,  the  iteration  count  for  the  primal-dual 
method  varies  from  19  to  34;  for  the  primal  barrier  method  in  [And96b] 
the  variation  is  from  33  to  176.  The  primal-dual  algorithm  also  obtains 
significantly  more  zero  norms  in  the  optimal  solution.  From  our  physical 
understanding  of  the  solution  we  believe  this  is  correct.  It  is  one  of  several 
indications  that  the  primal-dual  method  is  more  accurate  than  the  primal 
barrier  method. 

There  is  an  important  physical  interpretation  of  the  complementarity 
condition  (4)  in  the  plasticity  problems  considered  in  this  section:  the  vec¬ 
tors  Zi  represent  the  deformation  (strain)  tensor  at  discrete  points  in  the 
continuum  while  the  Xi  represent  the  stresses.  Thus,  if  there  is  any  de¬ 
formation  at  a  point,  then  the  stresses  at  that  point  are  on  their  bounds 
(have  norm  one)  and  their  directions  are  determined  by  the  complementar¬ 
ity  condition.  With  this  interpretation  the  complementarity  condition  is  the 
so-called  “how  rule”  for  the  material. 

In  the  third  set  of  test  problems,  only  a  small  part  of  the  material  under¬ 
goes  deformation;  therefore  a  very  large  number  of  the  norms  are  expected 
to  be  zero  in  the  optimal  solution.  As  shown  in  Table  9,  the  number  of  zero 
norms  varies  from  62  to  96  percent  of  the  total  number  of  terms.  Compared 
with  [And96b]  the  iteration  count  is  significantly  reduced  and  increases  very 
slowly  with  the  problem  size.  The  CPU  time  is  reduced  by  a  factor  4  or 
more,  and  we  are  able  to  solve  larger  instances  of  the  problem. 


N 

n 

m 

W 

\AH^Ar\ 

\L\ 

20 

1681 

1718 

59054 

76945 

246012 

40 

6561 

6636 

234145 

315504 
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60 
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80 
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1277402 
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2885699 

24240011 

N 
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| \zi\\  =  0 

relgap 

A®  || 

20 

20 
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1224 

9e  —  09 

4e 

-  11 

40 

32 

1416 

6156 

6e  —  09 

3e 

-  11 

60 

32 

4937 

14181 
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5e 

-  11 

80 

32 

12133 

25359 

8e  —  09 

4e 

-  11 

120 

29 

36516 

57127 

7e  —  09 

3e 

-  11 

Table  9:  Problem  and  solution  characteristics  for  IN a20 
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The  last  class  of  test  problems  is  taken  from  [AC98].  These  are  problems 
of  the  form  ( D )  with  additional  linear  equality  constraints: 

f  n  1 

min  <  ^  ||zj||  ,  such  that  Aj y  +  Zi  =  c,,  i  —  1, , , ,  ,  n,  and  ETy  =  d  >  , 

(35) 

where  E  E  Wnxl  and  d  E  i.e.  I  is  the  number  of  linear  constraints.  In 
[AC98],  it  is  shown  how  the  l\  penalty  function  approach  makes  it  possible 
to  transform  the  linearly  constrained  problem  to  the  unconstrained  form  ( D ) 
in  Section  1,  and  the  physical  interpretation  and  setup  of  the  test  problems 
are  described.  This  class  of  problems  is  denoted  cLYLS. 
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9 
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12 
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7402 

30 
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17511 
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60 
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99 
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14401 
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40401 
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787580 

9919534 
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3182023 
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N 
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relgap 
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4.4e 
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13 

4 

95 
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-  13 

l.le  — 

13 

30 

16 

31 
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4.0e 

-09 

2.3e 

-  13 

3.4e  — 
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60 

20 
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2878 

7.4e 

-09 

1.3e 

-  13 

7.8e  — 

11 

99 

24 
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1.2e 

-08 

l.Oe 

-  13 

7.0e  — 

13 
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25 

1238 

12311 

1.2e 

-08 

l.Oe 

-  11 

2.0e  — 

13 

201 

24 

6179 

35803 

3.1e 

-08 

l.Oe 

-  13 

5.1e  — 

13 

399* 

35 

34776 

146326 

7.8e 

-  14 

2.0e 

-  13 

6.3e  — 

13 

Table  10:  Problem  and  solution  characteristics  for  cLV13. 

Characteristics  and  results  for  these  constrained  problems  are  seen  in 
Table  10.  In  addition  to  the  number  l  of  linear  constraints,  there  is  a  new 
column,  “constr”,  indicating  the  relative  infeasibility  of  these  constraints 
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measured  by  the  expression 

\ETy-d\ 

Nl  +  l  ' 

For  the  primal  barrier  method  in  [AC98],  the  number  of  iterations  varies 
from  30  (for  N  =  3  and  N  =  12)  to  201  (for  N  =  300).  For  the  primal-dual 
method  the  variation  is  from  11  (for  N  =  3)  to  24  (for  N  =  201)  and  35 
(for  N  =  399).  For  the  case  N  =  201  the  CPU  time  is  reduced  from  36371 
seconds  in  [AC98]  to  6179  using  the  primal-dual  method.  However,  we  can 
do  even  better:  in  the  clJV13  problems  there  is  one  column  that  is  relatively 
dense,  resulting  in  considerable  fill-in  during  the  factorization.  Using  the 
technique  described  in  [And96a]  for  handling  dense  columns  these  problems 
can  be  solved  more  efficiently,  making  it  possible  to  solve  for  larger  values 
of  N.  The  asterisk  in  the  table  indicates  that  the  result  for  N  =  399  was 
obtained  by  this  method.  Using  the  same  technique,  the  case  N  =  201 
required  4293  CPU  seconds,  and  there  were  6367553  nonzero  elements  in 
the  L  factor. 

We  conclude  that  for  nonsmooth  problems  the  primal-dual  method  is  sig¬ 
nificantly  more  efficient  than  the  primal  barrier  method  applied  in  [And96b, 
AC098,  AC98].  The  number  of  iterations  increases  slowly  with  the  size  of 
the  problem.  Finally,  the  primal-dual  method  appears  to  be  less  vulnerable 
to  ill-conditioning  near  the  optimal  solution. 
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